Robust Python: a review

book review
clean-code
python
robust code
Author

Chandler Staggs

Published

February 8, 2024

Robust Python

Robust Python by Patrick Viafore makes the case, over 24 chapters and 348 pages, for what writing “robust” Python entails. The book adapts the Merriam-Webster dictionary entries for robust, which are:

  1. Having or exhibiting strength or vigorous health
  2. Having or showing vigor, strength, or firmness
  3. Strongly formed or constructed
  4. Capable of performing without failure under a wide range of conditions

Healthy systems stand the test of time and meet expectations for a long time. Code should exhibit strength, meaning it should be obvious that it will hold up over time. Strongly constructed code is built upon solid foundations. Finally, and most importantly, we want a system that is capable of performing without failure. Code is constantly evolving, and strong codebases are extensible, maintainable, and flexible. A single change should not be hard, and it should not cause issues.

A robust codebase is resilient and error-free in spite of constant change

A robust codebase is a maintainable codebase, and if a codebase is maintainable it can be changed rapidly and evolved to meet business needs. In order to be maintainable, code must communicate clearly. Developers must speak a similar language, a dialect if you will, so that they can make changes without fear of breaking things.

This is where the book begins to discuss intent. When writing code, we want the code to easily convey its intent. When someone looks at this code for the first time, can they figure out its intent? Do I understand what action this function is performing? Robust code is not rigid and unchanging, but ever changing and extending.

In the same spirit as the author’s example in the book of adjusting recipes for a different serving amount, I’ve contrived an example (albeit probably not as good as the author’s) of code that is hard to understand:

    # This function takes a person's name and a list of books with book name,
    # author name, and ID in first, second, and third indices, repeated
    # indefinitely. This function will check if books can be checked out and
    # then register them as being borrowed and return the return date and which
    # books have been borrowed
    def check_out(borrower_name, books):
        user = getUser(borrower_name)
        for i in range(0, len(books), 3):
            can_lend, error = can_borrow(user, books[i:i+3])
            if not can_lend:
                print(f'Cannot lend {books[i]} because {error}')
                del books[i:i+3]
        lend_books(user, books)
        date = create_return_date(user)
        return date, books

This code is the antithesis of what the rest of the book conveys as robust Python. As you can tell, this code leaves you with several questions.

Unfortunately, books is really confusing in this case. books is a flat list with three elements per book, and if a book can’t be checked out, the function deletes those elements from the list, mutating the caller’s argument. All of this is surprising to a developer who is looking at this code for the first time, and it adds to the cognitive load of a maintainer who may need to extend or fix it.

Patrick makes the case that code should be understandable and should communicate what it is doing without the reader having to seek help from an external, asynchronous source. This speeds up the maintainer’s job and reduces their cognitive burden.
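
To contrast with the check_out function above, here is a sketch of my own showing how the same routine might communicate its intent more directly; the Book dataclass is my own illustration, and the helper functions (getUser, can_borrow, lend_books, create_return_date) are assumed from the original snippet, with can_borrow taking a single Book here:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Book:
        title: str
        author: str
        book_id: int

    def check_out(borrower_name: str, books: List[Book]):
        user = getUser(borrower_name)
        lendable = []
        for book in books:
            can_lend, error = can_borrow(user, book)
            if can_lend:
                lendable.append(book)
            else:
                print(f"Cannot lend {book.title} because {error}")
        # the caller's list is left untouched; only lendable books are lent
        lend_books(user, lendable)
        return create_return_date(user), lendable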

Types

Types have a mechanical representation (how the type is actually laid out in memory and how operations on it are performed) and a semantic representation (the meaning they convey to developers about what ideas the type encodes, above the mechanical level, and what operations can be performed with it). A datetime has both a mechanical and a semantic representation, much like a vector or a string.

The type you choose actually conveys a lot of intent. For instance, the type of collection you choose encodes a lot of meaning and sets up the reader of the code with certain expectations. Someone looking at the code will ask themselves certain questions and make certain assumptions based upon the types the author uses. Is this code going to mutate? Is this collection going to contain duplicates? Is it going to be iterated over or indexed into? Fortunately, Python offers a sizeable collection (pun intended) of commonly used collection types: list, tuple, dict, set, and frozenset, among others.
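
As a contrived sketch of my own (not from the book), notice how the collection type alone sets expectations before you read a single line of logic:

    from typing import Dict, List, Set

    checkout_history: List[str] = []   # ordered, duplicates allowed, iterated over
    member_ids: Set[int] = set()       # unordered, no duplicates, membership tests
    fines: Dict[str, float] = {}       # keyed lookup from member name to amount owed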

Throughout the book, Patrick points out code smells, cues in the code that something is being done improperly. Here he notes that static indexing, such as getting recipe[0] or books["Python"], can be a code smell for using a collection type incorrectly; instead, one may want to consider a custom type such as an Enum or a dataclass, which I will discuss later.
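
A minimal sketch of what replacing static indexing looks like; the Recipe fields here are my own illustration, not the book’s:

    from dataclasses import dataclass

    @dataclass
    class Recipe:
        name: str
        servings: int

    recipe = Recipe(name="Shakshuka", servings=4)
    print(recipe.name)   # the intent is explicit, unlike the opaque recipe[0]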

Python’s “duck typing” nature promotes a dynamic type system where any object that abides by a certain interface can be passed around. This is also known as structural typing. As Patrick states, duck typing can be a double-edged sword: it can improve composability and allow for more general code without having to handle special edge cases; however, it can also increase the complexity of the code and break assumptions for developers, because now they must look at every place where something is called.

    from typing import Iterable

    def print_items(items: Iterable):
        for item in items:
            print(item)

    print_items([1, 2, 3])
    print_items({4, 5, 6})
    print_items({"A": 1, "B": 2, "C": 3})

Because each of the passed objects abides by the __iter__ interface, each can be passed to and used in this print_items function. The nice thing about Python’s data model is that you can apply these dunder methods to almost any object, and the object can then comply with the structure some operation expects, such as __contains__ for the in membership operator and __str__ for the string representation of objects. There are many of these, and they are one way you can create structural typing with objects.
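
For instance, a sketch of my own (the Shelf class is hypothetical, not from the book) showing __contains__ and __str__ in action:

    class Shelf:
        def __init__(self, books):
            self.books = books

        # __contains__ makes the `in` operator work against this object
        def __contains__(self, title) -> bool:
            return title in self.books

        # __str__ controls what str() and print() show for this object
        def __str__(self) -> str:
            return f"Shelf holding {len(self.books)} books"

    shelf = Shelf(["Robust Python", "Clean Code"])
    print("Robust Python" in shelf)   # True, via __contains__
    print(shelf)                      # Shelf holding 2 books, via __str__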

Type Annotations

In order to convey types to developers, Python introduced type annotations to the language as of version 3.5. This means we can annotate function signatures to better convey intent. We can now have function signatures that look like:

    def checkout(borrower_name: str, books: List[Book]) -> Tuple[datetime.date, List[Book]]:
        ...

Type annotations are not enforced at run time. A developer can pass any value into these parameters, including values of different types, and Python will never complain as long as the code can use that value. This is where static analyzers come in: programs such as mypy and Pylance perform static analysis on a developer’s code and will throw errors if some type-usage contract has been violated. This helps build confidence in the developer’s code, knowing that they upheld the intents of the APIs they used. This makes the code more robust; the developer is confident in extending the codebase, knowing the right types were used.
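
As a quick illustration of my own (the register function is hypothetical), a call that violates an annotation is caught at analysis time rather than at run time:

    def register(borrower_name: str) -> None:
        print(f"Registered {borrower_name}")

    # This runs without complaint at run time, but mypy flags it with an
    # error along the lines of:
    #   Argument 1 to "register" has incompatible type "int"; expected "str"
    register(42)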

Constraining Types

Constraining types are powerful because they help reduce the representable state space of your code. Here are the constraining types:

Optional

Optional communicates that a value can be of another type, or it can be None. If your code doesn’t handle None, it will crash when it tries to access an attribute on it:

    import random

    class Car:
        def __init__(self) -> None:
            self.drive = True

    def create_car():
        if random.random() > .5:
            return None
        else:
            return Car()

    car = create_car()

    print(car.drive)

Here you can see that without a static analyzer, the developer may not handle the case where the returned value is None. Here’s the error Pylance produces: “drive” is not a known member of “None”

With a check for if car is None, Pylance will recognize the check and the error will go away. Here, we should annotate the return type as Optional to indicate that None could be returned:

    import random
    from typing import Optional

    class Car:
        def __init__(self) -> None:
            self.drive = True

    def create_car() -> Optional[Car]:
        if random.random() > .5:
            return None
        else:
            return Car()

    car = create_car()

    if car:
        print(car.drive)

Literal

The Literal type annotation constrains the allowed values to specific literal values:

    from typing import Literal

    def get_city(state) -> Literal["Dallas", "San Francisco", "Chicago"]:
        if state == "Illinois":
            return "Chicago"
        if state == "California":
            return "San Francisco"
        else:
            return "Dallas"

Pylance will throw an error if the function returns something not defined in the Literal. Here’s that error in Pylance:

    from typing import Literal

    def get_city(state) -> Literal["Dallas", "San Francisco", "Chicago"]:
        if state == "Illinois":
            return "Chicago"
        if state == "California":
            return "San Francisco"
        else:
            return "Seattle"

    Expression of type "Literal['Seattle']" cannot be assigned to return type "Literal['Dallas', 'San Francisco', 'Chicago']"
      Type "Literal['Seattle']" cannot be assigned to type "Literal['Dallas', 'San Francisco', 'Chicago']"
        "Literal['Seattle']" cannot be assigned to type "Literal['Dallas']"
        "Literal['Seattle']" cannot be assigned to type "Literal['San Francisco']"
        "Literal['Seattle']" cannot be assigned to type "Literal['Chicago']"

Union

Unions are very powerful because they restrain the representable state space from a product of potential states to a sum of distinct states, meaning we can constrain all possible states returned by a function. We can break apart the return values of a function by using a union type. As the book demonstrates, we can create a natural separation of concerns by breaking objects up naturally, such as:

    from dataclasses import dataclass
    from datetime import date, timedelta
    from typing import List, Optional

    @dataclass
    class BorrowRecord:
        return_date: Optional[date]
        borrower: str
        books: List[Book]
        error: int
        error_occurred: bool

    def lend_books(books: List[Book], user: User) -> BorrowRecord:
        lendable = []
        for book in books:
            if check_book(book):
                lendable.append(book)
        for book in lendable:
            if not checkout(book):
                # failure still has to fill in every success field
                return BorrowRecord(return_date=None, borrower=user.name,
                                    books=[], error=1, error_occurred=True)
        return BorrowRecord(return_date=date.today() + timedelta(days=7),
                            borrower=user.name, books=lendable,
                            error=0, error_occurred=False)

Imagine we have a possible return date of 1 or 2 weeks, 1 of 10 borrowers registered at this library, 3 books max, error codes 1 through 5, and 2 possible values for the error flag. This creates a representable state space of 2 × 10 × 3 × 5 × 2 = 600, whereas if we use a union type we break this down to a sum: 2 × 10 × 3 = 60 success states or 5 error states. Quite a sizeable reduction in output space representation:

    from dataclasses import dataclass
    from datetime import date, timedelta
    from typing import List, Union

    @dataclass
    class Error:
        error: str

    @dataclass
    class BorrowRecord:
        return_date: date
        borrower: str
        books: List[Book]

    def lend_books(books: List[Book], user: User) -> Union[Error, BorrowRecord]:
        lendable = []
        for book in books:
            if check_book(book):
                lendable.append(book)
        for book in lendable:
            if not checkout(book):
                return Error(error=f"Failed to checkout {book}")
        return BorrowRecord(return_date=date.today() + timedelta(days=7),
                            borrower=user.name, books=lendable)

Annotated

Annotated types are just metadata provided about certain types, such as applying certain restrictions to base types, like forcing a string to match a regex or specifying that an integer must be between 1 and 100, or providing some other description of the data. At the time the book was written, Annotated metadata did not have any way of being enforced by a static analyzer. This is something I will have to look further into.
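
A sketch of my own showing the idea; the metadata strings here are purely descriptive and, as noted, are not enforced by mypy or Pylance:

    from typing import Annotated

    Rating = Annotated[int, "must be between 1 and 100"]

    def rate_book(rating: Rating) -> None:
        print(f"Rated {rating}")

    rate_book(500)   # type checks fine; nothing validates the range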

NewType

NewTypes are kind of like type aliases, but instead of being interchangeable, you cannot pass the original type where a NewType is expected. For example, if we have a Rentable type and we create a NewType from it such as DVD, a DVD object has all the same properties as a Rentable, but we can only pass a DVD to functions that specifically call for it.

    from typing import NewType

    class Rentable:
        def __init__(self):
            pass

    DVD = NewType("DVD", Rentable)

    def rent_movie(movie: DVD):
        print(f"Renting {movie}")

    book: Rentable = Rentable()
    rent_movie(book)

This gives the following error if you try to pass a Rentable:

    Argument of type "Rentable" cannot be assigned to parameter "movie" of type "DVD" in function "rent_movie"
      "type[type]" is incompatible with "type[DVD]"

Final

Final just ensures that a name cannot have its value reassigned:

    from typing import Final

    NAME: Final = "Library of Congress"

    NAME = "Library of Alexandria"

This produces the following error:

    "NAME" is declared as Final and cannot be reassigned

Setting a variable as Final ensures that the name is never reassigned throughout the codebase; note that Final prevents rebinding, not mutation of a mutable object. This can be useful in a large codebase.
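
To illustrate that caveat (a small sketch of my own):

    from typing import Final

    BRANCHES: Final = ["Main Street"]

    BRANCHES.append("Downtown")   # allowed: the underlying list is still mutable
    BRANCHES = ["Elm Street"]     # flagged: "BRANCHES" is declared as Final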

Collection Types

So far we’ve discussed constraining types: types that limit the representable state space, narrow the allowed states of the code, and therefore improve the developer’s understanding of the codebase. Now we move on to type-checking collections. Collections are data structures that hold a sequence of values, e.g. lists, tuples, dictionaries, sets, etc. The primary built-in collection types are list, tuple, dict, set, and frozenset, and there are many more for use in all kinds of scenarios.

Type annotating collection types is no different from other types really:

    def checkout_books(books: List[Book]) -> Union[Error, BorrowRecord]:
        ...

    def recommend_books_by_genre(user: User) -> Dict[str, Book]:
        ...

The issue with collections is that we must annotate both the collection itself and the types it contains. It’s easy if the collection only contains one type, known as a homogeneous collection, but if it contains more than a single type, known as a heterogeneous collection, it can become cumbersome. It’s generally easier to reason about homogeneous collections: you don’t need any special type inspection if your list contains only one type, and the more types the collection contains, the more special code is needed to handle each type.

For example, what if we wanted to represent some information about a person at a public library by storing their name, age, and the date they received their library card:

    from typing import Tuple
    from datetime import date

    user: Tuple[str, int, date] = ("Chandler Staggs", 30, date(2023, 11, 1))

If we want to access a specific attribute, we can supply the index, e.g. user[0]. However, having to remember which index corresponds to which attribute is cumbersome, especially as we add more attributes. A much easier way is to just ask for the name, age, or start date by name. Such a mapping can be provided by a dictionary:

    from typing import Dict, Union
    from datetime import date

    user: Dict[str, Union[str, int, date]] = {"name": "Chandler Staggs", "age": 30, "start_date": date(2023, 11, 1)}

Now we can say user['name'] and retrieve the user’s name without having to specify an index. Another issue arises, though: we still have to annotate each new type in the heterogeneous dict as it’s added, and this also doesn’t tell us which keys map to which types.

To overcome this, we can use TypedDict. A dictionary annotated as a TypedDict specifies which fields are available and what their types are:

    from typing import TypedDict
    from datetime import date

    class User(TypedDict):
        name: str
        age: int
        start_date: date

    user: User = {"name": "Chandler Staggs", "age": 30, "start_date": date(2023, 11, 1)}

This is helpful when we can’t avoid heterogeneous collections, such as when we make an API call that returns some JSON object, YAML, or some other data interchange format.
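
For example (a sketch of my own, reusing the User TypedDict defined above), a JSON payload can be reshaped into the TypedDict, parsing the date ourselves since JSON has no date type:

    import json
    from datetime import date

    payload = '{"name": "Chandler Staggs", "age": 30, "start_date": "2023-11-01"}'
    raw = json.loads(payload)           # json.loads returns Any
    user: User = {
        "name": raw["name"],
        "age": raw["age"],
        "start_date": date.fromisoformat(raw["start_date"]),
    }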

Generics

If we want to make our own collections, we can use Generics. Generics allow any specific type to be passed into our data structure without getting mixed up with other types; we can reference that type without having to know the actual type. For example, if we want to create a graph collection, we could do the following:

    from collections import defaultdict
    from typing import Generic, TypeVar

    Node = TypeVar("Node")
    Edge = TypeVar("Edge")

    class Graph(Generic[Node, Edge]):
        def __init__(self):
            self.edges: dict[Node, list[Edge]] = defaultdict(list)

        def add_edge(self, node: Node, edge: Edge):
            self.edges[node].append(edge)

        def get_edges(self, node: Node):
            return self.edges[node]

    books: Graph[Book, Book] = Graph()

    users_to_authors: Graph[User, Author] = Graph()

    books.add_edge(Book("Robust Python", "Patrick Viafore"), Book("Clean Code", "Robert C. Martin"))

    users_to_authors.add_edge(User("Chandler Staggs"), Author("Fyodor Dostoyevsky"))

This is similar to the example from the book.